The Hitchhiker's Guide to Facebook Web Tracking with Invisible Pixels and Click IDs
In recent years, advertising companies have used various tracking
methods to persistently track users across the web. Such tracking methods
usually include first and third-party cookies, cookie synchronization, as well
as a variety of fingerprinting mechanisms. Facebook (FB) recently introduced a
new tagging mechanism that attaches a one-time tag as a URL parameter (FBCLID)
on outgoing links to other websites. Although such a tag does not seem to have
enough information to persistently track users, we demonstrate that despite its
ephemeral nature, when combined with FB Pixel, it can aid in persistently
monitoring user browsing behavior across i) different websites, ii) different
actions on each website, iii) time, i.e., both in the past and in the future. We refer to this online monitoring of users as FB web tracking. We find
that FB Pixel tracks a wide range of user activities on websites with alarming
detail, especially on websites classified as sensitive categories under GDPR.
Also, we show how the FBCLID tag can be used to match, and thus de-anonymize, activities that FB Pixel tracked in the distant past (even before those users had a FB account). In fact, by combining this tag
with cookies that have rolling expiration dates, FB can also keep track of
users' browsing activities in the future as well. Our experimental results
suggest that 23% of the 10k most popular websites have adopted this technology and can contribute to this activity tracking on the web. Furthermore, our
longitudinal study shows that this type of user activity tracking can go as far
back as 2015. Simply put, if a user creates a FB account for the first time today, FB could, under some conditions, match their anonymously collected past web browsing activity, from as far back as 2015, to their newly created FB profile, and continue tracking their activity in the future.
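To make the matching idea above concrete, here is a minimal sketch of how an ephemeral click ID could bridge a platform identity and a browser cookie. The event log, click log, field names (fbp_cookie, fbclid, user_id), and values are invented for illustration; this is not the paper's dataset or FB's actual implementation.

```python
# Hypothetical illustration of the matching described above. All records and
# field names below are invented for this sketch.
from collections import defaultdict

pixel_events = [  # events a tracking pixel reports from third-party websites
    {"fbp_cookie": "fb.1.1111", "fbclid": "AbC123", "url": "https://clinic.example/results", "action": "PageView"},
    {"fbp_cookie": "fb.1.1111", "fbclid": None, "url": "https://shop.example/cart", "action": "AddToCart"},
]
click_log = [  # outgoing clicks from the platform, each tagged with a one-time FBCLID
    {"user_id": "user-42", "fbclid": "AbC123", "clicked": "https://clinic.example"},
]

# Step 1: the one-time FBCLID links a platform identity to a first-party cookie.
fbclid_to_user = {c["fbclid"]: c["user_id"] for c in click_log}
cookie_to_user = {}
for e in pixel_events:
    if e["fbclid"] in fbclid_to_user:
        cookie_to_user[e["fbp_cookie"]] = fbclid_to_user[e["fbclid"]]

# Step 2: once the cookie is tied to an identity, every other pixel event carrying
# that cookie (past or future, with or without an FBCLID) becomes attributable.
history = defaultdict(list)
for e in pixel_events:
    user = cookie_to_user.get(e["fbp_cookie"])
    if user:
        history[user].append((e["url"], e["action"]))

print(dict(history))
# {'user-42': [('https://clinic.example/results', 'PageView'),
#              ('https://shop.example/cart', 'AddToCart')]}
```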
FNDaaS: Content-agnostic Detection of Fake News sites
Automatic fake news detection is a challenging problem in the fight against the spread of misinformation, and it has tremendous real-world political and social impact. Past
studies have proposed machine learning-based methods for detecting such fake
news, focusing on different properties of the published news articles, such as
linguistic characteristics of the actual content, which, however, are limited by language barriers. Departing from such efforts, we propose FNDaaS, the first automatic, content-agnostic fake news detection method, which considers new and previously unstudied features such as the network and structural characteristics of each news website. The method can be offered as-a-Service, either at the ISP side for easier scalability and maintenance, or at the user side for better end-user privacy. We demonstrate the efficacy of our
method using data crawled from existing lists of 637 fake and 1183 real news
websites, and by building and testing a proof of concept system that
materializes our proposal. Our analysis of data collected from these websites
shows that the vast majority of fake news domains are very young and appear to keep an IP address associated with their domain for shorter periods than real news ones. By conducting various experiments with machine learning classifiers, we
demonstrate that FNDaaS can achieve an AUC score of up to 0.967 on past sites,
and 77-92% accuracy on newly-flagged ones.
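As a rough illustration of the kind of content-agnostic classification the abstract describes, the sketch below trains a standard classifier on per-website network/structural features such as domain age and IP-association duration. The feature set, values, and model choice are assumptions for the sketch, not the FNDaaS implementation.

```python
# Sketch of content-agnostic classification over per-website network/structural
# features of the kind mentioned above. All feature values are invented placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Each row: [domain_age_days, mean_days_ip_associated, num_ip_changes]
X = np.array([
    [3650, 900, 2],   # long-lived real news site with a stable IP
    [2800, 700, 3],
    [45,   10,  6],   # very young fake news site churning IPs
    [30,    7,  8],
])
y = np.array([0, 0, 1, 1])  # 0 = real, 1 = fake

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=2, scoring="roc_auc")
print("AUC per fold:", scores)
```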
Shadow Honeypots
We present Shadow Honeypots, a novel hybrid architecture that combines the best features of honeypots and anomaly detection. At a high level, we use a variety of anomaly detectors to monitor all traffic to a protected network or service. Traffic that is considered anomalous is processed by a "shadow honeypot" to determine the accuracy of the anomaly prediction. The shadow is an instance of the protected software that shares all internal state with a regular ("production") instance of the application, and is instrumented to detect potential attacks. Attacks against the shadow are caught, and any incurred state changes are discarded. Legitimate traffic that was misclassified is validated by the shadow and handled correctly by the system, transparently to the end user. The outcome of processing a request by the shadow is used to filter future attack instances and could be used to update the anomaly detector. Our architecture allows system designers to fine-tune systems for performance, since false positives will be filtered by the shadow. We demonstrate the feasibility of our approach in a proof-of-concept implementation of the Shadow Honeypot architecture for the Apache web server and the Mozilla Firefox browser. We show that despite a considerable overhead in the instrumentation of the shadow honeypot (up to 20% for Apache), the overall impact on the system is diminished by the ability to minimize the rate of false positives.
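The routing logic described above can be sketched as follows; the anomaly detector, the shadow's attack check, and the shared state are simplified stand-ins, not the paper's Apache or Firefox instrumentation.

```python
# Simplified stand-in for the architecture described above: an anomaly detector
# routes suspicious requests to a state-sharing "shadow" instance that catches
# attacks and rolls back their side effects. All checks here are toy heuristics.
attack_filter = set()    # requests the shadow has already confirmed as attacks
app_state = {"hits": 0}  # state shared by the production and shadow instances

def looks_anomalous(request: str) -> bool:
    # placeholder anomaly detector: flag oversized or oddly encoded payloads
    return len(request) > 64 or "%u" in request

def shadow_instance(request: str) -> bool:
    """Process the request in the instrumented shadow copy.
    Return True if an attack is detected; in that case roll back state changes."""
    snapshot = dict(app_state)
    app_state["hits"] += 1                 # the shadow mutates the shared state
    attack = "overflow" in request         # stand-in for memory-violation checks
    if attack:
        app_state.clear()
        app_state.update(snapshot)         # discard any incurred state changes
    return attack

def production_instance(request: str) -> str:
    app_state["hits"] += 1
    return "served"

def handle(request: str) -> str:
    if request in attack_filter:
        return "dropped"                   # known attack instance, filtered outright
    if looks_anomalous(request):
        if shadow_instance(request):
            attack_filter.add(request)     # outcome used to filter future instances
            return "dropped"
        return "served"                    # false positive: shadow handled it, state kept
    return production_instance(request)    # normal fast path

print(handle("GET /index.html"))           # served
print(handle("GET /" + "overflow" * 20))   # dropped (caught by the shadow)
```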
Speeding up TCP/IP: Faster Processors are not Enough
Over the last decade we have witnessed a tremendous increase in the capacity of our computing and communication systems. On the one hand, processor speeds have been increasing exponentially, doubling every 18 months or so, while network bandwidth has followed a similar (if not higher) rate of improvement, doubling every 9-12 months or so. Unfortunately, applications that communicate frequently using standard protocols like TCP/IP do not seem to improve at similar rates.
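A back-of-the-envelope illustration of the growth rates quoted in the abstract: compounding the stated doubling intervals over a decade shows how far network bandwidth outpaces processor speed, which is why protocol processing, rather than raw CPU speed, becomes the bottleneck.

```python
# Compounded improvement over one decade at the doubling intervals quoted above.
decade_months = 120
for label, doubling_months in [("processor speed (x2 every 18 months)", 18),
                               ("network bandwidth (x2 every 12 months)", 12),
                               ("network bandwidth (x2 every 9 months)", 9)]:
    factor = 2 ** (decade_months / doubling_months)
    print(f"{label}: ~{factor:,.0f}x over 10 years")
# processor speed grows ~100x, while network bandwidth grows ~1,000-10,000x
```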
On caching search engine query results
In this paper we explore the problem of caching search engine query results in order to reduce the computing and I/O requirements needed to support the functionality of a search engine for the World Wide Web. We study query traces from the EXCITE search engine and show that they have a significant amount of temporal locality: that is, a significant percentage of the queries have been submitted more than once by the same or a different user. Using trace-driven simulation we demonstrate that medium-size caches can hold the results of most of the frequently-submitted queries. Finally, we compare the effectiveness of static and dynamic caching and conclude that although dynamic caching can use large caches more effectively, static caching can perform better for (very) small caches.
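A toy illustration of the static-versus-dynamic comparison described above, using an invented query trace rather than the EXCITE traces: "static" caches the most frequent queries of a warm-up period, while "dynamic" is a plain LRU cache.

```python
# Static vs. dynamic caching of query results on a synthetic trace.
from collections import Counter, OrderedDict

trace = ["cars", "mp3", "cars", "weather", "mp3", "cars", "python", "mp3", "weather", "cars"]
CACHE_SIZE = 2

# Static cache: fix the top-k queries observed during a warm-up period.
warmup, test = trace[: len(trace) // 2], trace[len(trace) // 2:]
static_cache = {q for q, _ in Counter(warmup).most_common(CACHE_SIZE)}
static_hits = sum(q in static_cache for q in test)

# Dynamic cache: LRU replacement over the same test portion of the trace.
lru, dynamic_hits = OrderedDict(), 0
for q in test:
    if q in lru:
        dynamic_hits += 1
        lru.move_to_end(q)
    else:
        lru[q] = True
        if len(lru) > CACHE_SIZE:
            lru.popitem(last=False)   # evict the least recently used query

print(f"static hits: {static_hits}/{len(test)}, dynamic hits: {dynamic_hits}/{len(test)}")
```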
Visualizing Working Sets
It is widely known that most applications exhibit locality of reference. That is, applications access only a subset of their pages during any phase of their execution. This subset of pages is usually called the working set of the application. In this note we present the working sets of applications in pictorial form so that they can be easily viewed and understood. Based on these working set "pictures" we make observations about the size, the duration, and the regularity of the working sets of various applications. Our applications cover several domains, including numerical applications, program development tools, CAD simulations, and database applications. Our results suggest that most numerical and some database applications have regular access patterns and good locality of reference. Although most database and program development applications seem to have little locality of reference, careful observations at the appropriate granularity reveal regular access patterns in these applications as well.
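As a sketch of how such working set "pictures" can be derived from a page-reference trace, the snippet below counts the distinct pages touched within a sliding window over a synthetic trace; the window size and the trace itself are arbitrary assumptions.

```python
# Working set size over time: distinct pages referenced within a sliding window.
def working_set_sizes(trace, window):
    return [len(set(trace[max(0, t - window + 1): t + 1])) for t in range(len(trace))]

# Synthetic trace: a phase touching pages 0-3, then a phase touching pages 10-11.
trace = [0, 1, 2, 3] * 5 + [10, 11] * 10
for t, s in enumerate(working_set_sizes(trace, window=8)):
    print(f"t={t:2d} |{'#' * s}")   # crude textual "picture" of the working set size
```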